NSF PAR Search | NSF Public Access Repository

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization

Tang, Cheng; Liu, Zhishuai; Xu, Pan (May 2025, Proceedings of the 42nd International Conference on Machine Learning)

The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured regularization, potentially leading to conservative policies under unrealistic transitions. To address this limitation, we propose a novel framework, the $d$-rectangular linear RRMDP ($d$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employs linear function approximation for robust policy learning in $d$-RRMDPs with $f$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, demonstrating that these bounds are influenced by how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. We establish information-theoretic lower bounds to verify that our algorithm is near-optimal. Finally, numerical experiments validate that R2PVI learns robust policies and exhibits superior computational efficiency compared to baseline methods.
Free, publicly-accessible full text available May 1, 2026
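
The abstract does not spell out the regularized inner problem that R2PVI solves at each step. As a minimal sketch, assume the KL-divergence member of the f-divergence family and a finite next-state support. The regularized worst case inf_P E_P[V] + λ·KL(P‖P0) then has the standard closed form -λ·log E_P0[exp(-V/λ)], attained by P*(s') ∝ P0(s')·exp(-V(s')/λ). The function names are hypothetical, and the sketch omits the d-rectangular linear structure, the function approximation, and the pessimism penalty of R2PVI.

```python
import math

def kl_regularized_worst_case(values, nominal, lam):
    """inf over P of E_P[V] + lam * KL(P || P0), via -lam * log E_P0[exp(-V / lam)].

    values: V(s') per next state; nominal: P0(s'); lam > 0 is the regularization weight.
    (Hypothetical helper: KL case only, finite support.)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    v_min = min(values)
    # Shift by min(V) before exponentiating (log-sum-exp trick) to avoid underflow.
    z = sum(p * math.exp(-(v - v_min) / lam) for v, p in zip(values, nominal))
    return v_min - lam * math.log(z)

def worst_case_distribution(values, nominal, lam):
    """Minimizing distribution: P*(s') proportional to P0(s') * exp(-V(s') / lam)."""
    v_min = min(values)
    w = [p * math.exp(-(v - v_min) / lam) for v, p in zip(values, nominal)]
    total = sum(w)
    return [wi / total for wi in w]

def primal_objective(values, nominal, p, lam):
    """E_P[V] + lam * KL(P || P0), evaluated directly for a given P (used as a check)."""
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, nominal) if pi > 0)
    return sum(pi * v for pi, v in zip(p, values)) + lam * kl

# Usage: the regularized value lies between min(V) = 0 and the nominal mean E_P0[V] = 0.8.
V, P0 = [1.0, 0.0, 2.0], [0.2, 0.5, 0.3]
for lam in (0.1, 1.0, 10.0):
    print(lam, kl_regularized_worst_case(V, P0, lam))
```

A smaller λ lets the adversary shift more mass toward low-value states, so the value falls toward min(V); as λ grows it approaches the nominal expectation. `primal_objective` exists only to confirm the closed form numerically.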
Robust matrix estimations meet Frank–Wolfe algorithm

https://doi.org/10.1007/s10994-023-06325-w

Jing, Naimin; Fang, Ethan X; Tang, Cheng Yong (July 2023, Machine Learning)

Full Text Available
High-dimensional empirical likelihood inference

https://doi.org/10.1093/biomet/asaa051

Chang, Jinyuan; Chen, Song Xi; Tang, Cheng Yong; Wu, Tong Tong (October 2020, Biometrika)
High-dimensional statistical inference with general estimating equations is challenging and remains little explored. We study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specification tests. First, we propose to construct a new set of estimating equations such that the impact from estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by empirical likelihood ratio. Second, we propose a test statistic as the maximum of the marginal empirical likelihood ratios to quantify data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. Numerical studies demonstrate promising performance and potential practical benefits of the new methods.
Full Text Available
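
Both methods in this abstract are built on the empirical likelihood ratio. As a minimal sketch of that building block only, the code below computes Owen's -2 log empirical likelihood ratio for a single scalar mean. It does not implement the paper's high-dimensional estimating equations, nuisance-parameter correction, or maximum-of-marginal-ratios test, and the function name is hypothetical. Under standard conditions the statistic is asymptotically χ² with one degree of freedom, so the set of μ with statistic at most 3.84 is an approximate 95% confidence interval.

```python
import math

def neg2_log_el_ratio(xs, mu, tol=1e-12, max_iter=200):
    """Owen's -2 log empirical likelihood ratio for H0: E[X] = mu (scalar case).

    The maximizing weights are w_i = 1 / (n * (1 + lam * (x_i - mu))), where lam
    solves sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    """
    d = [x - mu for x in xs]
    if not (min(d) < 0 < max(d)):
        return math.inf  # mu outside the convex hull of the data: the ratio is 0
    # All weights must stay positive: 1 + lam * d_i > 0 for every i.
    lo, hi = -1.0 / max(d), -1.0 / min(d)

    def score(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    # score is strictly decreasing on (lo, hi), so bisection finds its unique root.
    a, b = lo, hi
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        if score(mid) > 0:
            a = mid
        else:
            b = mid
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * di) for di in d)

xs = [1.0, 2.0, 3.0, 4.0]
# Keep candidate means whose statistic is at most 3.84 (approximate 95% level).
print([mu for mu in (1.5, 2.0, 2.5, 3.0, 3.5) if neg2_log_el_ratio(xs, mu) <= 3.84])
```

Bisection is chosen over Newton's method here because the feasible interval for the multiplier is known in closed form and the score is monotone on it, which makes convergence unconditional.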
Disentangling and assessing uncertainties in multiperiod corporate default risk predictions

https://doi.org/10.1214/18-AOAS1170

Yuan, Miao; Tang, Cheng Yong; Hong, Yili; Yang, Jian (December 2018, The Annals of Applied Statistics)

Full Text Available
Convergence Rate of Stochastic k-means

Tang, Cheng; Monteleoni, Claire (January 2017, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics)

We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate O(1/t) under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate O(1/t) with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.
Full Text Available
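
The abstract treats online and mini-batch k-means as stochastic-approximation versions of Lloyd's algorithm. The sketch below shows a mini-batch update in the style of Sculley (2010): assign a sampled batch to the nearest centers, then move each center toward its assigned points with a per-center step size 1/n_j. With batch_size=1 it reduces to the online update. The function names, the random-sample initialization, and this particular step-size schedule are assumptions for illustration; the paper's analysis may use different learning rates and initialization, and this code does not by itself demonstrate the O(1/t) rate.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], x))

def minibatch_kmeans(data, k, batch_size, n_iter, init=None, seed=0):
    rng = random.Random(seed)
    # Default initialization (an assumption): k distinct data points chosen at random.
    start = init if init is not None else rng.sample(data, k)
    centers = [list(c) for c in start]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign all batch points using the centers from before this batch's updates.
        assign = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assign):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center step size 1/n_j
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers

def kmeans_cost(data, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(c, x) for c in centers) for x in data)

blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(minibatch_kmeans(blob_a + blob_b, k=2, batch_size=4, n_iter=200))
```

Because the first update of each center uses η = 1, every center ends up equal to the running mean of all points ever assigned to it, which is what makes the 1/n_j schedule a stochastic-approximation analogue of Lloyd's mean step.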
